Integrating Provenance into an Operational Data Product Information System
Abstract
Supported by the National Science Foundation under awards OCI-0968277 and OCI-0943761.
Knowledge of how a science data product has been generated is critical to determining its fitness-for-use for a given analysis. One objective of science information systems is to allow users to search for data products based on a wide range of criteria: spatial and temporal extent, observed parameter, research domain, and organizational project are common search criteria. Currently, science information systems are geared towards helping users find data, but not towards helping them determine how the products were generated. An information system that exposes the provenance of available data products, that is, the observations, assumptions, and science processing involved in generating them, would significantly benefit users' fitness-for-use decision-making.
In this work we discuss semantics-driven provenance extensions to the Virtual Solar Terrestrial Observatory (VSTO) information system. The VSTO semantic web portal uses an ontology to provide a unified search and product retrieval interface to data in the fields of solar, solar-terrestrial, and space physics. We have developed an extension to the VSTO ontology that allows it to express item-level data product records. We will show how the Open Provenance Model (OPM) and the Proof Markup Language (PML) can be used to express the provenance of data product records. Additionally, we will discuss ways in which domain semantics can aid in formulating and answering provenance queries. Our extension to the VSTO ontology has also been integrated with a solar-terrestrial profile of the Observations and Measurements (O&M) model to support domain-specific descriptions of solar-terrestrial observations; we use this integration to connect observation events to the lineage of data product records.
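As a rough illustration of how OPM-style provenance could link a data product record back to the observations it derives from, the sketch below stores two of OPM's causal edge types, `used` (process to artifact) and `wasGeneratedBy` (artifact to process), and walks the graph upstream from a record. It is a minimal sketch under stated assumptions: the `ProvenanceGraph` class, its methods, and every identifier (such as `obs:raw-2010-001` or the `calibration` process) are hypothetical and are not part of VSTO, PML, or any OPM toolkit.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical in-memory OPM-style graph of artifacts and processes."""

    # process id -> artifact ids the process used (OPM "used" edges)
    used: dict[str, list[str]] = field(default_factory=dict)
    # artifact id -> process id that generated it (OPM "wasGeneratedBy" edges)
    generated_by: dict[str, str] = field(default_factory=dict)

    def add_process(self, process: str, inputs: list[str], outputs: list[str]) -> None:
        """Record that `process` used `inputs` and generated `outputs`."""
        self.used.setdefault(process, []).extend(inputs)
        for artifact in outputs:
            self.generated_by[artifact] = process

    def lineage(self, artifact: str) -> tuple[list[str], list[str]]:
        """Breadth-first walk upstream from `artifact`.

        Returns (upstream artifacts, processes) in discovery order,
        excluding the starting artifact itself.
        """
        artifacts: list[str] = []
        processes: list[str] = []
        seen_artifacts = {artifact}
        seen_processes: set[str] = set()
        queue = deque([artifact])
        while queue:
            current = queue.popleft()
            process = self.generated_by.get(current)
            if process is None or process in seen_processes:
                continue  # a raw observation, or a process already expanded
            seen_processes.add(process)
            processes.append(process)
            for source in self.used.get(process, []):
                if source not in seen_artifacts:
                    seen_artifacts.add(source)
                    artifacts.append(source)
                    queue.append(source)
        return artifacts, processes


# Hypothetical chain: raw observation -> calibrated record -> quality-checked product
graph = ProvenanceGraph()
graph.add_process("calibration", inputs=["obs:raw-2010-001", "cal:table-v2"],
                  outputs=["rec:calibrated-001"])
graph.add_process("quality-check", inputs=["rec:calibrated-001"],
                  outputs=["rec:product-001"])

artifacts, processes = graph.lineage("rec:product-001")
print(processes)  # ['quality-check', 'calibration']
print(artifacts)  # ['rec:calibrated-001', 'obs:raw-2010-001', 'cal:table-v2']
```

A real deployment would more likely keep these edges as RDF triples in the ontology and answer lineage questions with a graph query; plain dictionaries are used here only to keep the traversal visible.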
Our additions to the VSTO ontology will allow us to extend the VSTO web portal user interface with search criteria based on provenance and observation characteristics. More critically, provenance information will allow the VSTO portal to display important knowledge about selected data records: what science processes and assumptions were applied to generate the record, what observations the record derives from, and the results of quality processing that has been applied to the record and to any records it derives from. We conclude by presenting our interface for displaying record provenance information and discuss how it helps users determine the fitness-for-use of the data.
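To make the fitness-for-use idea concrete, here is a minimal sketch, assuming each data record carries a list of the records it was derived from (in the spirit of OPM's `wasDerivedFrom` edge) and a set of quality-check results. The `DataRecord` type, the `quality_report` and `passes_quality` functions, the check names `dropout_scan` and `range_check`, and all record identifiers are hypothetical illustrations, not the VSTO portal's actual data model. The report gathers the quality results of a record and of every record it derives from, the kind of summary a portal could display or turn into a provenance-based search filter.

```python
from dataclasses import dataclass, field


@dataclass
class DataRecord:
    """Hypothetical data product record with derivation links and QC results."""

    record_id: str
    derived_from: list[str] = field(default_factory=list)  # upstream record ids
    quality: dict[str, bool] = field(default_factory=dict)  # check name -> passed?


def quality_report(record_id: str, catalog: dict[str, DataRecord]) -> dict[str, dict[str, bool]]:
    """Collect QC results for a record and all of its ancestors (depth-first)."""
    report: dict[str, dict[str, bool]] = {}
    stack = [record_id]
    while stack:
        current = stack.pop()
        if current in report or current not in catalog:
            continue  # already visited, or not held in this catalog
        record = catalog[current]
        report[current] = dict(record.quality)
        stack.extend(record.derived_from)
    return report


def passes_quality(record_id: str, catalog: dict[str, DataRecord]) -> bool:
    """True if no quality check failed anywhere in the record's lineage."""
    report = quality_report(record_id, catalog)
    return all(all(checks.values()) for checks in report.values())


catalog = {
    "obs:a": DataRecord("obs:a", quality={"dropout_scan": True}),
    "obs:b": DataRecord("obs:b", quality={"dropout_scan": False}),
    "rec:1": DataRecord("rec:1", derived_from=["obs:a"], quality={"range_check": True}),
    "rec:2": DataRecord("rec:2", derived_from=["obs:a", "obs:b"],
                        quality={"range_check": True}),
}

# Provenance-based search: keep only records whose whole lineage passed QC.
fit = [rid for rid in ("rec:1", "rec:2") if passes_quality(rid, catalog)]
print(fit)  # ['rec:1']
```

Here `rec:2` passes its own range check but is excluded because one of its source observations failed an upstream check, which is exactly the information a user would miss when looking at the record alone.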
Similar resources
The Case for Fine-Grained Stream Provenance
The current state of the art for provenance in data stream management systems (DSMS) is to provide provenance at a high level of abstraction (such as which sensors in a sensor network an aggregated value is derived from). This limitation was imposed by high-throughput requirements and an anticipated lack of application demand for more detailed provenance information. In this work, we firs...
A Provenance Tracking Model for Data Updates
For data-centric systems, provenance tracking is particularly important when the system is open and decentralised, such as the Web of Linked Data. In this paper, a concise but expressive calculus which models data updates is presented. The calculus is used to provide an operational semantics for a system where data and updates interact concurrently. The operational semantics of the calculus als...
Sharing geospatial provenance in a service-oriented environment
One of the earliest investigations of provenance was inspired by applications in GIS in the early 1990s. Provenance records the processing history of a data product. It provides an information context to help users determine the reliability of data products. Conventional provenance applications in GIS focus on provenance capture, representation, and usage in a stand-alone environment such as a...
Issues in Building Practical Provenance Systems
The importance of maintaining provenance has been widely recognized, particularly with respect to highly-manipulated data. However, there are few deployed databases that provide provenance information with their data. We have constructed a database of protein interactions (MiMI), which is heavily used by biomedical scientists, by manipulating and integrating data from several popular biological...
Distinguishing Provenance Equivalence of Earth Science Data
Reproducibility of scientific research relies on accurate and precise citation of data and the provenance of that data. Earth science data are often the result of applying complex data transformation and analysis workflows to vast quantities of data. Provenance information of data processing is used for a variety of purposes, including understanding the process and auditing as well as reproduci...